A concave regularization technique for sparse mixture models
Abstract
Latent variable mixture models are a powerful tool for exploring the structure in large datasets. A common challenge in interpreting such models is the desire to impose sparsity, the natural assumption that each data point contains only a few latent features. Since mixture distributions are constrained in their L1 norm, typical sparsity techniques based on L1 regularization become toothless, and concave regularization becomes necessary. Unfortunately, concave regularization typically results in EM algorithms that must perform problematic non-concave M-step maximizations. In this work, we introduce a technique for circumventing this difficulty, using the so-called Mountain Pass Theorem to provide easily verifiable conditions under which the M-step is well-behaved despite the lack of concavity. We also develop a correspondence between logarithmic regularization and what we term the pseudo-Dirichlet distribution, a generalization of the ordinary Dirichlet distribution well-suited for inducing sparsity. We demonstrate our approach on a text corpus, inferring a sparse topic mixture model for 2,406 weblogs.
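The claim that L1 regularization is toothless on mixture weights can be checked numerically. The sketch below is illustrative only: the function names, the offset `eps`, and the generic penalty `sum(log(theta_k + eps))` are assumptions made for this example and are not necessarily the exact regularizer used in the paper. It compares a dense and a sparse weight vector, both lying on the probability simplex.

```python
import math

def l1_penalty(theta):
    """L1 norm of a weight vector."""
    return sum(abs(t) for t in theta)

def log_penalty(theta, eps=0.01):
    """Generic concave logarithmic penalty (assumed form): sum_k log(theta_k + eps).

    eps is a hypothetical small offset that keeps the logarithm finite at zero.
    """
    return sum(math.log(t + eps) for t in theta)

# Two mixture-weight vectors on the simplex (nonnegative, summing to 1).
dense = [0.25, 0.25, 0.25, 0.25]
sparse = [0.97, 0.01, 0.01, 0.01]

# The L1 norm is the same for both, so it cannot favor the sparse vector.
print("L1 :", l1_penalty(dense), l1_penalty(sparse))

# The logarithmic penalty is smaller for the sparse vector, so minimizing it
# (or subtracting a positive multiple of it from a log-likelihood being
# maximized) pushes weights toward sparsity.
print("log:", log_penalty(dense), log_penalty(sparse))
```

Because every point on the simplex has L1 norm 1, the first penalty only shifts the objective by a constant, whereas the concave logarithmic penalty distinguishes the two vectors.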
Similar resources
Large-scale Inversion of Magnetic Data Using Golub-Kahan Bidiagonalization with Truncated Generalized Cross Validation for Regularization Parameter Estimation
In this paper a fast method for large-scale sparse inversion of magnetic data is considered. The L1-norm stabilizer is used to generate models with sharp and distinct interfaces. To deal with the non-linearity introduced by the L1-norm, a model-space iteratively reweighted least squares algorithm is used. The original model matrix is factorized using the Golub-Kahan bidiagonalization that proje...
A superlinearly convergent R-regularized Newton scheme for variational models with concave sparsity-promoting priors
A general class of variational models with concave priors is considered for obtaining certain sparse solutions, for which nonsmoothness and non-Lipschitz continuity of the objective functions pose significant challenges from an analytical as well as numerical point of view. For computing a stationary point of the underlying variational problem, a Newton-type scheme with provable convergence pro...
Asymptotic Equivalence of Regularization Methods in Thresholded Parameter Space
High-dimensional data analysis has motivated a spectrum of regularization methods for variable selection and sparse modeling, with two popular methods being convex and concave ones. A long debate has taken place on whether one class dominates the other, an important question both in theory and to practitioners. In this article, we characterize the asymptotic equivalence of regularization method...
Convex-constrained Sparse Additive Modeling and Its Extensions
Sparse additive modeling is a class of effective methods for performing high-dimensional nonparametric regression. In this work we show how shape constraints such as convexity/concavity and their extensions, can be integrated into additive models. The proposed sparse difference of convex additive models (SDCAM) can estimate most continuous functions without any a priori smoothness assumption. M...
Efficient Learning of Sparse Gaussian Mixture Models of Protein Conformational Substates
Molecular Dynamics (MD) simulations are an important technique for studying the conformational dynamics of proteins in Computational Structural Biology. Traditional methods for the analysis of MD simulations assume a single conformational state underlying the data. With recent developments in MD simulation technologies, MD simulation can now produce massive and long time-scale trajectories acro...